{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
    "Licensed under the MIT License.\n",
    "\n",
    "# Inference Bert Model for High Performance with ONNX Runtime on AzureML #\n",
    "\n",
    "This tutorial includes how to pretrain and finetune Bert models using AzureML, convert it to ONNX, and then deploy the ONNX model with ONNX Runtime through Azure ML. In the following sections, we are going to use the Bert model trained with Stanford Question Answering Dataset (SQuAD) dataset as an example. Bert SQuAD model is used in question answering scenarios, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.\n",
    "\n",
    "## Roadmap\n",
    "\n",
    "0. **Prerequisites** to set up your Azure ML work environments.\n",
    "1. **Pre-train, finetune and export Bert model** from other framework using Azure ML.\n",
    "2. **Deploy Bert model using ONNX Runtime and AzureML**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 0 - Prerequisites\n",
    "If you are using an [Azure Machine Learning Notebook VM](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-run-cloud-notebook), you are all set. Otherwise, refer to the [configuration Notebook](https://github.com/Azure/MachineLearningNotebooks/blob/56e0ebc5acb9614fac51d8b98ede5acee8003820/configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace. Prerequisites are:\n",
    "* Azure subscription\n",
    "* Azure Machine Learning Workspace\n",
    "* Azure Machine Learning SDK\n",
    "\n",
    "Also to make the best use of your time, make sure you have done the following:\n",
    "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
    "* [Azure Portal](https://portal.azure.com) allows you to track the status of your deployments."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 - Pretrain, Finetune and Export Bert Model (PyTorch)\n",
    "\n",
    "If you'd like to pre-train and finetune a Bert model from scratch, follow the instructions in [\n",
    "Pretraining of the BERT model](https://github.com/microsoft/AzureML-BERT/blob/master/pretrain/PyTorch/notebooks/BERT_Pretrain.ipynb) to pretrain a Bert model in PyTorch using AzureML. Once you have the pretrained model, refer to [AzureML Bert Eval Squad](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_SQUAD.ipynb) or [AzureML Bert Eval GLUE](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_GLUE.ipynb) to finetune your model with your desired dataset. Follow the tutorials all the way through **Create a PyTorch estimator for fine-tuning**. Before creating a Pytorch estimator, we need to prepare an entry file that trains and exports the PyTorch model together. Make sure the entry file has the following code to create an ONNX file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_model_path = \"bert_azureml_large_uncased.onnx\"\n",
    "\n",
    "# set the model to inference mode\n",
    "# It is important to call torch_model.eval() or torch_model.train(False) before exporting the model, \n",
    "# to turn the model to inference mode. This is required since operators like dropout or batchnorm \n",
    "# behave differently in inference and training mode.\n",
    "model.eval()\n",
    "\n",
    "# Generate dummy inputs to the model. Adjust if necessary.\n",
    "inputs = {\n",
    "        'input_ids':   torch.randint(32, [2, 32], dtype=torch.long).to(device), # list of numerical ids for the tokenised text\n",
    "        'attention_mask': torch.ones([2, 32], dtype=torch.long).to(device),        # dummy list of ones\n",
    "        'token_type_ids':  torch.ones([2, 32], dtype=torch.long).to(device),        # dummy list of ones\n",
    "    }\n",
    "\n",
    "symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}\n",
    "torch.onnx.export(model,                                        # model being run\n",
    "                  (inputs['input_ids'], \n",
    "                   inputs['attention_mask'], \n",
    "                   inputs['token_type_ids']),                   # model input (or a tuple for multiple inputs)\n",
    "                  output_model_path,                            # where to save the model (can be a file or file-like object)\n",
    "                  opset_version=11,                             # the ONNX version to export the model to\n",
    "                  do_constant_folding=True,                     # whether to execute constant folding for optimization\n",
    "                  input_names=['input_ids', \n",
    "                               'input_mask', \n",
    "                               'segment_ids'],                   # the model's input names\n",
    "                  output_names=['start', \"end\"],                 # the model's output names\n",
    "                  dynamic_axes={'input_ids': symbolic_names,              \n",
    "                                'input_mask' : symbolic_names,\n",
    "                                'segment_ids' : symbolic_names,\n",
    "                                'start' : symbolic_names, \n",
    "                                'end': symbolic_names})     # variable length axes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this directory, a `run_squad_azureml.py` containing the above code is available for use. Copy the training script `run_squad_azureml.py` to your `project_root` (defined at an earlier step in [AzureML Bert Eval Squad](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_SQUAD.ipynb))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shutil.copy('run_squad_azureml.py', project_root)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you may continue to follow the **Create a PyTorch estimator for fine-tuning** section in [AzureML Bert Eval Squad](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_SQUAD.ipynb). In creating the estimator, change `entry_script` parameter to point to the `run_squad_azureml.py` we just copied as noted in the following code. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator = PyTorch(source_directory=project_roots, \n",
    "                    script_params={'--output-dir': './outputs'},\n",
    "                    compute_target=gpu_compute_target,\n",
    "                    use_docker=True,\n",
    "                    custom_docker_image=image_name,\n",
    "                    script_params = {...},\n",
    "                    entry_script='run_squad_azureml.py', # change here\n",
    "                    node_count=1,\n",
    "                    process_count_per_node=4,\n",
    "                    distributed_backend='mpi',\n",
    "                    use_gpu=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Follow the rest of the [AzureML Bert Eval Squad](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_SQUAD.ipynb) to run and export your model. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 - Deploy Bert model with ONNX Runtime through AzureML\n",
    "\n",
    "In Step 1 and 2, we have prepared an optimized ONNX Bert model and now we can deploy this model as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
    "\n",
    "We're now going to deploy our ONNX model on Azure ML using the following steps.\n",
    "\n",
    "1. **Register our model** in our Azure Machine Learning workspace\n",
    "2. **Write a scoring file** to evaluate our model with ONNX Runtime\n",
    "3. **Write environment file** for our Docker container image.\n",
    "4. **Deploy to the cloud** using an Azure Container Instances VM and use it to make predictions using ONNX Runtime Python APIs\n",
    "5. **Classify sample text input** so we can explore inference with our deployed service.\n",
    "\n",
    "![End-to-end pipeline with ONNX Runtime](https://raw.githubusercontent.com/vinitra/models/gtc-demo/gtc-demo/E2EPicture.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2.0 - Check your AzureML environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check core SDK version number\n",
    "import azureml.core\n",
    "from PIL import Image, ImageDraw, ImageFont\n",
    "import json\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "print(\"SDK version:\", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load your Azure ML workspace\n",
    "\n",
    "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core import Workspace\n",
    "\n",
    "ws = Workspace.from_config()\n",
    "print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2.1 - Register your model with Azure ML\n",
    "\n",
    "Now we upload the model and register it in the workspace. In the following tutorial. we use the bert SQuAD model outputted from Step 1 as an example. \n",
    "\n",
    "You can also register the model from your run to your workspace. The model_path parameter takes in the relative path on the remote VM to the model file in your outputs directory. You can then deploy this registered model as a web service through the AML SDK."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = run.register_model(model_path = \"./bert_azureml_large_uncased.onnx\", # Name of the registered model in your workspace.\n",
    "                           model_name = \"bert-squad-large-uncased\", # Local ONNX model to upload and register as a model\n",
    "                           model_framework=Model.Framework.ONNX , # Framework used to create the model.\n",
    "                           model_framework_version='1.6', # Version of ONNX used to create the model.\n",
    "                           tags = {\"onnx\": \"demo\"},\n",
    "                           description = \"Bert-large-uncased squad model exported from PyTorch\",\n",
    "                           workspace = ws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, if you're working on a local model and want to deploy it to AzureML, upload your model to the same directory as this notebook and register it with `Model.register()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.model import Model\n",
    "\n",
    "model = Model.register(model_path = \"./bert_azureml_large_uncased.onnx\", # Name of the registered model in your workspace.\n",
    "                       model_name = \"bert-squad-large-uncased\", # Local ONNX model to upload and register as a model\n",
    "                       model_framework=Model.Framework.ONNX , # Framework used to create the model.\n",
    "                       model_framework_version='1.6', # Version of ONNX used to create the model.\n",
    "                       tags = {\"onnx\": \"demo\"},\n",
    "                       description = \"Bert-large-uncased squad model exported from PyTorch\",\n",
    "                       workspace = ws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Displaying your registered models\n",
    "\n",
    "You can optionally list out all the models that you have registered in this workspace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "models = ws.models\n",
    "for name, m in models.items():\n",
    "    print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)\n",
    "    \n",
    "#     # If you'd like to delete the models from workspace\n",
    "#     model_to_delete = Model(ws, name)\n",
    "#     model_to_delete.delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2.2 - Write scoring file\n",
    "\n",
    "We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object. Then the `run()` function is called when we run the model using the Azure ML web service. Add necessary `preprocess()` and `postprocess()` steps. The following score.py file uses `bert-squad` as an example and assumes the inputs will be in the following format. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inputs_json = {\n",
    "  \"version\": \"1.4\",\n",
    "  \"data\": [\n",
    "    {\n",
    "      \"paragraphs\": [\n",
    "        {\n",
    "          \"context\": \"In its early years, the new convention center failed to meet attendance and revenue expectations.[12] By 2002, many Silicon Valley businesses were choosing the much larger Moscone Center in San Francisco over the San Jose Convention Center due to the latter's limited space. A ballot measure to finance an expansion via a hotel tax failed to reach the required two-thirds majority to pass. In June 2005, Team San Jose built the South Hall, a $6.77 million, blue and white tent, adding 80,000 square feet (7,400 m2) of exhibit space\",\n",
    "          \"qas\": [\n",
    "            {\n",
    "              \"question\": \"where is the businesses choosing to go?\",\n",
    "              \"id\": \"1\"\n",
    "            },\n",
    "            {\n",
    "              \"question\": \"how may votes did the ballot measure need?\",\n",
    "              \"id\": \"2\"\n",
    "            },\n",
    "            {\n",
    "              \"question\": \"When did businesses choose Moscone Center?\",\n",
    "              \"id\": \"3\"\n",
    "            }\n",
    "          ]\n",
    "        }\n",
    "      ],\n",
    "      \"title\": \"Conference Center\"\n",
    "    }\n",
    "  ]\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile score.py\n",
    "import os\n",
    "import collections\n",
    "import json\n",
    "import time\n",
    "from azureml.core.model import Model\n",
    "import numpy as np    # we're going to use numpy to process input and output data\n",
    "import onnxruntime    # to inference ONNX models, we use the ONNX Runtime\n",
    "import wget\n",
    "from pytorch_pretrained_bert.tokenization import whitespace_tokenize, BasicTokenizer, BertTokenizer\n",
    "\n",
    "def init():\n",
    "    global session, tokenizer\n",
    "    # use AZUREML_MODEL_DIR to get your deployed model(s). If multiple models are deployed, \n",
    "    # model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), '$MODEL_NAME/$VERSION/$MODEL_FILE_NAME')\n",
    "    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'bert_azureml_large_uncased.onnx')\n",
    "    sess_options = onnxruntime.SessionOptions()\n",
    "    \n",
    "    # You need set environment variables like OMP_NUM_THREADS for OpenMP to get best performance.\n",
    "    # See https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/bert/notebooks/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb\n",
    "    sess_options.intra_op_num_threads = 1\n",
    "    \n",
    "    session = onnxruntime.InferenceSession(model_path, sess_options)\n",
    "    \n",
    "    tokenizer = BertTokenizer.from_pretrained(\"bert-large-uncased\", do_lower_case=True)\n",
    "    \n",
    "    # download run_squad.py and tokenization.py from \n",
    "    # https://github.com/onnx/models/tree/master/text/machine_comprehension/bert-squad to \n",
    "    # help with preprocessing and post-processing. \n",
    "    if not os.path.exists('./run_onnx_squad.py'):\n",
    "        url = \"https://raw.githubusercontent.com/onnx/models/master/text/machine_comprehension/bert-squad/dependencies/run_onnx_squad.py\"\n",
    "        wget.download(url, './run_onnx_squad.py')\n",
    "\n",
    "    if not os.path.exists('./tokenization.py'):\n",
    "        url = \"https://raw.githubusercontent.com/onnx/models/master/text/machine_comprehension/bert-squad/dependencies/tokenization.py\"\n",
    "        wget.download(url, './tokenization.py')\n",
    "\n",
    "def preprocess(input_data_json):\n",
    "    \n",
    "    global all_examples, extra_data\n",
    "    \n",
    "    # Model configs. Adjust as needed.\n",
    "    max_seq_length = 128\n",
    "    doc_stride = 128\n",
    "    max_query_length = 64\n",
    "\n",
    "    # Write the input json to file to be used by read_squad_examples()\n",
    "    input_data_file = \"input.json\"\n",
    "    with open(input_data_file, 'w') as outfile:\n",
    "        json.dump(json.loads(input_data_json), outfile)\n",
    "    \n",
    "    from run_onnx_squad import read_squad_examples, convert_examples_to_features\n",
    "    # Use read_squad_examples method from run_onnx_squad to read the input file\n",
    "    all_examples = read_squad_examples(input_file=input_data_file)\n",
    "    \n",
    "    \n",
    "\n",
    "    # Use convert_examples_to_features method from run_onnx_squad to get parameters from the input \n",
    "    input_ids, input_mask, segment_ids, extra_data = convert_examples_to_features(all_examples, tokenizer,\n",
    "                                                                              max_seq_length, doc_stride, max_query_length)\n",
    "    return input_ids, input_mask, segment_ids\n",
    "\n",
    "def postprocess(all_results):\n",
    "    # postprocess results\n",
    "    from run_onnx_squad import write_predictions\n",
    "\n",
    "    n_best_size = 20\n",
    "    max_answer_length = 30\n",
    "    output_dir = 'predictions'\n",
    "    os.makedirs(output_dir, exist_ok=True)\n",
    "    output_prediction_file = os.path.join(output_dir, \"predictions.json\")\n",
    "    output_nbest_file = os.path.join(output_dir, \"nbest_predictions.json\")\n",
    "    # Write the predictions (answers to the questions) in a file.\n",
    "    write_predictions(all_examples, extra_data, all_results,\n",
    "                    n_best_size, max_answer_length,\n",
    "                    True, output_prediction_file, output_nbest_file)\n",
    "    # Retrieve best results from file.\n",
    "    result = {}\n",
    "    with open(output_prediction_file, \"r\") as f:\n",
    "        result = json.load(f)\n",
    "    return result\n",
    "\n",
    "def run(input_data_json):\n",
    "    try:\n",
    "        # load in our data\n",
    "        input_ids, input_mask, segment_ids = preprocess(input_data_json)\n",
    "        RawResult = collections.namedtuple(\"RawResult\", [\"unique_id\", \"start_logits\", \"end_logits\"])\n",
    "        \n",
    "        n = len(input_ids)\n",
    "        bs = 1\n",
    "        all_results = []\n",
    "        start = time.time()\n",
    "        for idx in range(0, n):\n",
    "            item = all_examples[idx]\n",
    "            # this is using batch_size=1\n",
    "            # feed the input data as int64\n",
    "            data = {\n",
    "                    \"segment_ids\": segment_ids[idx:idx+bs],\n",
    "                    \"input_ids\": input_ids[idx:idx+bs],\n",
    "                    \"input_mask\": input_mask[idx:idx+bs]\n",
    "                    }\n",
    "            result = session.run([\"start\", \"end\"], data)\n",
    "            in_batch = result[0].shape[0]\n",
    "            start_logits = [float(x) for x in result[1][0].flat]\n",
    "            end_logits = [float(x) for x in result[0][0].flat]\n",
    "            for i in range(0, in_batch):\n",
    "                unique_id = len(all_results)\n",
    "                all_results.append(RawResult(unique_id=unique_id, start_logits=start_logits, end_logits=end_logits))\n",
    "                \n",
    "        end = time.time()\n",
    "        print(\"total time: {}sec, {}sec per item\".format(end - start, (end - start) / len(all_results)))\n",
    "        return {\"result\": postprocess(all_results),\n",
    "                \"total_time\": end - start, \n",
    "               \"time_per_item\": (end - start) / len(all_results)}\n",
    "    except Exception as e:\n",
    "        result = str(e)\n",
    "        return {\"error\": result}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2.3 - Write Environment File\n",
    "\n",
    "We create a YAML file that specifies which dependencies we would like to see in our container."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.conda_dependencies import CondaDependencies \n",
    "\n",
    "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\", \"azureml-defaults\", \"tensorflow\", \"wget\", \"pytorch_pretrained_bert\"])\n",
    "\n",
    "with open(\"myenv.yml\",\"w\") as f:\n",
    "    f.write(myenv.serialize_to_string())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're all set! Let's get our model chugging.\n",
    "\n",
    "## Step 2.4 - Deploy Model as Webservice on Azure Container Instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.webservice import AciWebservice\n",
    "from azureml.core.model import InferenceConfig\n",
    "from azureml.core.environment import Environment\n",
    "\n",
    "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
    "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
    "\n",
    "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
    "                                               memory_gb = 4, \n",
    "                                               tags = {'demo': 'onnx'}, \n",
    "                                               description = 'web service for Bert-squad-large-uncased ONNX model')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following cell will likely take a few minutes to run as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.webservice import Webservice\n",
    "from random import randint\n",
    "\n",
    "aci_service_name = 'onnx-bert-squad-large-uncased-'+str(randint(0,100))\n",
    "print(\"Service\", aci_service_name)\n",
    "\n",
    "aci_service = Model.deploy(ws, \n",
    "                           aci_service_name, \n",
    "                           [model], \n",
    "                           inference_config, \n",
    "                           aciconfig)\n",
    "\n",
    "aci_service.wait_for_deployment(True)\n",
    "print(aci_service.state)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if aci_service.state != 'Healthy':\n",
    "    # run this command for debugging.\n",
    "    print(aci_service.get_logs())\n",
    "    aci_service.delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Success!\n",
    "\n",
    "If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(aci_service.scoring_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2.5 - Inference Bert Model using our WebService\n",
    "\n",
    "**Input**: Context paragraph and questions as formatted in `inputs.json`\n",
    "\n",
    "**Task**: For each question about the context paragraph, the model predicts a start and an end token from the paragraph that most likely answers the questions.\n",
    "\n",
    "**Output**: The best answer for each question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use the inputs from step 2.2\n",
    "print(\"========= INPUT DATA =========\")\n",
    "print(json.dumps(inputs_json, indent=2))\n",
    "azure_result = aci_service.run(json.dumps(inputs_json))\n",
    "print(\"\\n\")\n",
    "print(\"========= RESULT =========\")\n",
    "print(json.dumps(azure_result, indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res = azure_result['result']\n",
    "inference_time = np.round(azure_result['total_time'] * 1000, 2)\n",
    "time_per_item = np.round(azure_result['time_per_item'] * 1000, 2)\n",
    "\n",
    "print('========================================')\n",
    "print('Final predictions are: ')\n",
    "for key in res:\n",
    "    print(\"Question: \", inputs_json['data'][0]['paragraphs'][0]['qas'][int(key) - 1]['question'])\n",
    "    print(\"Best Answer: \", res[key])\n",
    "    print()\n",
    "\n",
    "print('========================================')\n",
    "print('Inference time: ' + str(inference_time) + \" ms\")\n",
    "print('Average inference time for each question: ' + str(time_per_item) + \" ms\")\n",
    "print('========================================')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you are eventually done using the web service, remember to delete it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aci_service.delete()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
